Take-home_Ex2

Author

Flora Phyo Zin Htet

Overview

What drives urban dwellers to wake up early in the morning to commute from their homes to their workplaces? What is the impact of removing a public bus service on the commuters residing along the corridor of the bus route? These and many other questions related to urban mobility are challenges faced by transport operators and urban managers.

Traditionally, commuter surveys have been used to answer such questions. However, commuter surveys are costly, time-consuming and laborious, and the survey data tend to take a long time to clean and analyse. As a result, it is not unusual that by the time the survey report is ready, much of the information is already out of date!

As city-wide urban infrastructures such as public buses, mass rapid transit, public utilities and roads become digital, the data sets they generate can be used to track movement patterns through space and time. This is particularly true with the recent trend of massive deployment of pervasive computing technologies, such as GPS devices on vehicles and smart cards used by public transport commuters.

Unfortunately, this explosive growth of geospatially-referenced data has far outpaced planners' ability to utilise and transform the data into insightful information, thus creating an adverse impact on the return on the investment made to collect and manage the data.

Getting Started

Installing and Loading the R Packages

The code chunk below installs and loads the tmap, sf, sp, DT, stplanr, performance, reshape2, ggpubr, units and tidyverse packages into the R environment.

pacman::p_load(tmap, sf, sp, DT, stplanr,
               performance, reshape2,
               ggpubr, units, tidyverse)

Data Preparation

Working with Aspatial data

Import origin_destination_bus_202310.csv into R by using read_csv() of the readr package. The output is an R data frame object, pv.

pv <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Use the head() function to print the first few rows of the data frame to quickly inspect the dataset's structure and content.

head(pv)
# A tibble: 6 × 7
  YEAR_MONTH DAY_TYPE   TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
  <chr>      <chr>              <dbl> <chr>   <chr>          <chr>              
1 2023-10    WEEKENDS/…            16 BUS     04168          10051              
2 2023-10    WEEKDAY               16 BUS     04168          10051              
3 2023-10    WEEKENDS/…            14 BUS     80119          90079              
4 2023-10    WEEKDAY               14 BUS     80119          90079              
5 2023-10    WEEKDAY               17 BUS     44069          17229              
6 2023-10    WEEKENDS/…            17 BUS     20281          20141              
# ℹ 1 more variable: TOTAL_TRIPS <dbl>

A quick check of the pv tibble data frame shows that the values in ORIGIN_PT_CODE and DESTINATION_PT_CODE are imported as character data type. The code chunk below converts these columns into factor data type so that they can be matched consistently with the bus stop data later.

pv$ORIGIN_PT_CODE <- as.factor(pv$ORIGIN_PT_CODE) 
pv$DESTINATION_PT_CODE <- as.factor(pv$DESTINATION_PT_CODE)
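Keeping these codes as text matters because many bus stop codes carry leading zeros. A minimal base R sketch (the code value "04168" is taken from the head() output above):

```r
# Bus stop codes carry leading zeros, so they must stay as text.
code <- "04168"                # a bus stop code from the data
as.numeric(code)               # 4168 – the leading zero is silently lost
as.character(as.factor(code))  # "04168" – the code is preserved
```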

Extracting the study data

We will extract weekday commuting flows that take place between 6 and 9 o'clock in the morning.

pv6_9 <- pv %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
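The grouped-output message above is harmless, but it can be silenced by setting the .groups argument explicitly, as the message suggests. A small sketch with hypothetical toy data:

```r
library(dplyr)

# toy stand-in for the pv data frame
toy <- tibble::tibble(
  ORIGIN_PT_CODE      = c("A", "A", "B"),
  DESTINATION_PT_CODE = c("X", "X", "Y"),
  TOTAL_TRIPS         = c(10, 5, 3)
)

flows <- toy %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS),
            .groups = "drop")  # drop all grouping; no message is emitted
```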

The table below shows the content of pv6_9.

datatable(pv6_9)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

Working with Geospatial Data

We will be using the following geospatial data sets:

BusStop: This data provides the locations of bus stops as at the last quarter of 2022.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\zhphyo\ISSS624\Take-home_Ex2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
busstop
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
   BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
1       22069        B06   OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2       32071        B23         AFT TRACK 13 POINT (13228.59 44206.38)
3       44331        B01              BLK 239  POINT (21045.1 40242.08)
4       96081        B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5       11561        B05              BLK 166 POINT (24568.74 30391.85)
6       66191        B03         AFT CORFE PL POINT (30951.58 38079.61)
7       23389       B02A              PEC LTD   POINT (12476.9 32211.6)
8       54411        B02              BLK 527 POINT (30329.45 39373.92)
9       28531        B09              BLK 536 POINT (14993.31 36905.61)
10      96139        B01              BLK 148  POINT (41642.81 36513.9)

MPSZ-2019: This data provides the sub-zone boundaries of the URA Master Plan 2019.

mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `D:\zhphyo\ISSS624\Take-home_Ex2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
Next, st_make_grid() of the sf package is used to create a hexagonal grid with a cell size of 750m covering the bus stop layer. Hexagons that do not contain any bus stop are then removed.

hex_Grid <- st_make_grid(busstop, c(750), what = "polygons", square = FALSE)

# convert to sf and add a grid ID
grid_sf = st_sf(hex_Grid) %>%
  mutate(grid_id = 1:length(lengths(hex_Grid)))

# count the number of bus stops in each grid cell
# https://gis.stackexchange.com/questions/323698/counting-points-in-polygons-with-sf-package-of-r
grid_sf$n_busstops = lengths(st_intersects(grid_sf, busstop))

# remove grid cells with no bus stops (i.e. no points inside that cell)
busstop_grid = filter(grid_sf, n_busstops > 0)
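To make the point-in-polygon counting step concrete, here is a minimal self-contained sketch (toy geometries, not the project data) showing how lengths(st_intersects()) turns the sparse intersection list into a per-cell count:

```r
library(sf)

# Two unit squares side by side (toy stand-ins for hexagon cells)
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
cells <- st_sf(geometry = st_sfc(sq1, sq2))

# Three points: two fall inside the first square, one inside the second
pts <- st_sf(geometry = st_sfc(st_point(c(0.25, 0.5)),
                               st_point(c(0.75, 0.5)),
                               st_point(c(1.5, 0.5))))

# st_intersects() returns, for each cell, the indices of intersecting points;
# lengths() reduces that list to a count per cell
cells$n_pts <- lengths(st_intersects(cells, pts))
cells$n_pts  # 2 1
```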

Geospatial data wrangling

Combining Busstop and Hexagon

The code chunk below intersects the busstop sf data frame with busstop_grid so that each bus stop is tagged with the grid_id of the hexagon it falls in.

busstop_hex <- st_intersection(busstop, busstop_grid) %>%
  select(BUS_STOP_N, grid_id) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
datatable(busstop_hex)

We are going to append the hexagon grid_id from the busstop_hex data frame onto the pv6_9 data frame.

pv_data <- left_join(pv6_9 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)
Warning in left_join(pv6_9, busstop_hex, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 25632 of `x` matches multiple rows in `y`.
ℹ Row 3057 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Check for duplicated records.

duplicate <- pv_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

Duplicated records are found. The code chunk below will be used to retain only the unique records.

pv_data <- unique(pv_data)

Confirm that the duplicated records issue has been fully addressed.

duplicate <- pv_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

print(nrow(duplicate))
[1] 0
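The duplicates above arise because busstop_hex contains some bus stop codes more than once (for example, a stop sitting on the boundary of two hexagons, or a duplicate entry in the BusStop layer), so the left join fans out. A toy sketch of the mechanism, using hypothetical codes and grid IDs:

```r
library(dplyr)

trips <- tibble::tibble(ORIGIN = c("04168", "04168"),
                        TRIPS  = c(10, 20))
# the same stop code appears twice in the lookup table
lookup <- tibble::tibble(BUS_STOP_N = c("04168", "04168"),
                         grid_id    = c(7L, 7L))

joined <- left_join(trips, lookup,
                    by = c("ORIGIN" = "BUS_STOP_N"),
                    relationship = "many-to-many")  # silences the warning
nrow(joined)          # 4 – every trip row matched both lookup rows
nrow(unique(joined))  # 2 – dropping duplicates restores the original flows
```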

Next, we will update the pv_data data frame with the destination hexagon.

pv_data <- left_join(pv_data , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
Warning in left_join(pv_data, busstop_hex, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 167 of `x` matches multiple rows in `y`.
ℹ Row 3146 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Check for duplicated records.

duplicate <- pv_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

Duplicated records are found. The code chunk below will be used to retain only the unique records.

pv_data <- unique(pv_data)

Confirm that the duplicated records issue has been fully addressed.

duplicate <- pv_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

print(nrow(duplicate))
[1] 0
pv_data <- pv_data %>%
  rename(DESTIN_SZ = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.

Visualising Spatial Interaction

We will prepare the desire lines by using the stplanr package.

Removing intra-zonal flows

We will not plot the intra-zonal flows. The code chunk below will be used to remove intra-zonal flows.

pv_data1 <- pv_data[pv_data$ORIGIN_SZ!=pv_data$DESTIN_SZ,]
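As a quick sanity check, the same row-filtering idiom applied to a tiny hypothetical flow table:

```r
flows <- data.frame(ORIGIN_SZ    = c(1L, 1L, 2L),
                    DESTIN_SZ    = c(1L, 2L, 2L),
                    MORNING_PEAK = c(5, 8, 3))

# keep only inter-zonal rows (origin differs from destination)
inter <- flows[flows$ORIGIN_SZ != flows$DESTIN_SZ, ]
nrow(inter)  # 1 – the two intra-zonal rows are dropped
```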

Creating desire lines

In this code chunk below, od2line() of stplanr package is used to create the desire lines.

flowLine <- od2line(flow = pv_data1, 
                    zones = busstop_grid,
                    zone_code = "grid_id")
Creating centroids representing desire line start and end points.

Visualising the desire lines

To visualise the resulting desire lines, the code chunk below is used.

tmap_mode("plot")
tmap mode set to plotting
tmap_options(check.and.fix = TRUE)
tm_shape(mpsz) +
  tm_polygons() +
tm_shape(busstop_grid) +
  tm_borders(col = "grey40", lwd = 0.7)
Warning: The shape mpsz is invalid. See sf::st_is_valid

flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3 )
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.49. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

When the flow data are very messy and highly skewed like the one shown above, it is wiser to focus on selected flows, for example flows greater than or equal to 5000, as shown below.

tmap_mode("plot")
tmap mode set to plotting
tmap_options(check.and.fix = TRUE)

tm_shape(mpsz) +
  tm_polygons() +
tm_shape(busstop_grid) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3 )
Warning: The shape mpsz is invalid. See sf::st_is_valid
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.49. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

Spatial Interaction Modelling